# Long Audio Processing

Whisper Large V3 Turbo

Whisper is OpenAI's state-of-the-art automatic speech recognition (ASR) and speech translation model, trained on over 5 million hours of labeled data and showing strong zero-shot generalization. The Turbo version is a pruned and fine-tuned variant of Whisper large-v3: its decoder is cut from 32 layers to 4, which makes it significantly faster at the cost of a slight drop in quality.

Speech Recognition

Transformers, Supports Multiple Languages

Whisper Large V3

Whisper is OpenAI's state-of-the-art automatic speech recognition (ASR) and speech translation model, supporting multiple languages.

Speech Recognition

Safetensors, Supports Multiple Languages

Quantum_STT is an advanced automatic speech recognition (ASR) and speech translation model, trained with large-scale weak supervision, supporting multiple languages and tasks.

Speech Recognition

Transformers, Supports Multiple Languages

Whisper Large V3 Turbo Gguf

Whisper large-v3-turbo is a pruned and fine-tuned version of Whisper large-v3, with the decoder reduced from 32 layers to 4, significantly improving speed while slightly reducing quality.

Speech Recognition, Supports Multiple Languages

Whisper Small Tel

A speech recognition model based on OpenAI Whisper-large-v2, fine-tuned on Telugu audio datasets.

Speech Recognition

Transformers, Other

Distil Large V3.5

Distil-Whisper is a knowledge-distilled version of OpenAI Whisper-Large-v3, achieving efficient speech recognition through large-scale pseudo-label training.

Speech Recognition

Transformers, English

Whisper Large V3 Turbo Common Voice 19 0 Zh TW

A Traditional Chinese (Taiwan) automatic speech recognition model fine-tuned from OpenAI Whisper-large-v3-turbo.

Speech Recognition

Transformers, Chinese

Whisper Large V3 Turbo

Whisper is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, demonstrating strong generalization capabilities in zero-shot settings.

Speech Recognition

Transformers, Supports Multiple Languages

Kotoba Whisper V2.0 Faster

A Whisper speech recognition model optimized for CTranslate2, specifically tailored for Japanese, providing efficient speech-to-text functionality.

Speech Recognition, Japanese

Audio Transcribe

This is a Transformer-based Automatic Speech Recognition (ASR) model for transcribing audio files into text.

Speech Recognition

Distil Small.en

Distil-Whisper is a distilled version of the Whisper model that is 6x faster and 49% smaller, while performing within 1% WER of the original on out-of-distribution evaluation sets.

Speech Recognition

Transformers, English

Whisper Large V3 German

A German speech recognition model fine-tuned from Whisper Large v3 and optimized for German speech processing and recognition.

Speech Recognition

Transformers, German

Distil Medium.en

Distil-Whisper is a distilled version of the Whisper model, 6 times faster than the original, with a 49% reduction in size, while maintaining performance close to the original in English speech recognition tasks.

Speech Recognition, English

Distil Large V2

Distil-Whisper is a distilled version of the Whisper model, achieving a 6x speedup and a 49% size reduction while staying within 1% WER of the original on out-of-distribution evaluation sets.

Speech Recognition, English
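
Several of the Distil-Whisper entries above quote accuracy in terms of WER (word error rate). As a rough illustration of what that number measures, the following is a minimal sketch that computes WER as the word-level edit distance between a reference transcript and a model transcript, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from any of the listed model cards, and real evaluations normally normalize text (casing, punctuation, numerals) before scoring, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration only; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words, so it is not a percentage of "wrong words" in the strict sense.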

© 2025 AIbase